ABSTRACT


Knowledge tracing (KT), the task of tracking the knowledge
state of each student over time, has been studied actively
by artificial intelligence researchers. Recent reports have
described that Deep-IRT, which combines Item Response
Theory (IRT) with a deep learning model, provides superior
performance. Like IRT, it can express the ability of each student
and the difficulty of each item. However, its
interpretability and applicability remain limited compared
to those of IRT because the ability parameter depends on
each item. Namely, the ability estimate for the same student
at the same time might differ if the student attempts a different
item. To overcome those difficulties, this study proposes a
novel Deep-IRT model that models a student response to an
item by two independent networks: a student network and
an item network. Results of experiments demonstrate that
the proposed method improves on the prediction accuracy and the
interpretability of earlier KT methods.

1. INTRODUCTION


Recently, along with the advancement of online education, 
Knowledge Tracing (KT) has attracted broad attention for 
helping students to learn effectively by presenting optimal 
problems and a teacher’s support [5, 14, 16, 22, 23, 24, 37, 
39, 43, 45, 46]. Important tasks of KT are tracing the stu- 
dent’s evolving knowledge state and discovering concepts 
that the student has not mastered based on the student’s 
prior learning history data. Furthermore, predicting a stu- 
dent’s performance (correct or incorrect responses to an un- 
known item) accurately is important for adaptive learning. 
Many researchers have developed various methods to solve 
KT tasks. Methods for KT are divisible into probabilistic 
approaches and deep-learning approaches. 


For example, Bayesian Knowledge Tracing (BKT), a traditional
and well-known probabilistic model for KT [1, 5, 8, 14,
16, 22, 23, 26, 45], employs a Hidden Markov Model to trace
the process of student ability growth. It predicts the probability
of a student responding to an item correctly. Item
Response Theory (IRT) [3, 34, 35], which is used in the test
theory area [10, 11, 12, 13, 28, 33, 36], has also come to be used
for KT [6, 40]. IRT predicts a student’s correct
answer probability for an item based on the student’s latent
ability parameter and item characteristic parameters.


In practice, a learning task is associated with multiple skills.
Students must master multiple skills to
solve a task. However, BKT and IRT are restricted to
expressing only a uni-dimensional ability.


To overcome these limitations, Deep Knowledge Tracing (DKT)
[24] was proposed as the first deep-learning-based method.
DKT employs Long Short-Term Memory (LSTM) [27] to
predict a student’s performance. LSTM relaxes the restrictions
of skill separation and binary state assumptions. However,
the hidden states of the LSTM hold only a summary of the past
sequence of learning history data. Therefore, DKT
does not explicitly treat the student’s ability for each skill.


To improve the DKT performance, various deep-learning- 
based methods have been proposed [2, 4, 17, 19, 29, 30, 
31, 38, 42, 44]. In particular, the dynamic key-value memory
network (DKVMN) was developed to exploit the relations 
among underlying skills and to trace the respective knowl- 
edge states [46]. To trace student ability, DKVMN uses a 
Memory-Augmented Neural Network and attention mecha- 
nisms. Furthermore, to improve the explanatory capabilities 
of the parameters, Deep-IRT was proposed by combining 
DKVMN with an IRT module [43]. In fact, Deep-IRT can 
estimate a student’s ability and an item’s difficulty just as 
standard IRT models can. However, the ability parameter of 
the Deep-IRT depends on each item characteristic because 
it implicitly assumes that items with the same skills are 
equivalent. The assumption does not hold when the item 
difficulties for the same skills differ greatly. Items for the 
same skills which are not equivalent hinder interpretation of 
a student’s ability estimate. 


Most recently, Ghosh et al. (2020) proposed attentive knowledge
tracing (AKT) [7], which incorporates a forgetting function
for past data into attention mechanisms. Additionally,
they pointed out the problem that earlier KT methods assumed
that items with the same skills are equivalent. To resolve
that difficulty, they employed both items and skills as
inputs. AKT improved the predictive accuracy of a student’s
performance. However, the interpretability of its
parameters is limited because it cannot express the transition
of a student’s ability for each skill.


Earlier studies have attempted to develop deep-learning-based
methods that provide parameter interpretability similar to that of IRT
models, but those studies have not achieved it for student
ability parameters, which are the most important for student
modeling. The problem is the difficulty of incorporating the
ability parameters and item parameters independently into
deep-learning-based methods without degrading prediction
accuracy. This study addresses that problem.


Recent studies of deep learning have shown that redundancy 
of parameters for training data reduces generalization error, 
contrary to Occam’s razor. The studies also clarify the rea- 
sons [9, 20, 21]. Based on state-of-the-art reports, this study 
proposes a novel Deep-IRT that models a student’s response 
to an item by two independent redundant networks: a stu- 
dent network and an item network. The proposed method 
learns student parameters and item parameters indepen- 
dently to avoid impairing the predictive accuracy. A student 
network employs memory network architecture to reflect dy- 
namic changes of student abilities as DKVMN does. There- 
fore, the ability parameters of the proposed method do not 
depend on each item characteristic. They have higher interpretability
than those of Deep-IRT. Moreover, the proposed
method employs both items and skills as inputs, in a manner
different from that of Ghosh et al. (2020) [7]. Although Tsutsumi et
al. previously proposed a Deep-IRT for test theory, it cannot
be applied to KT because it assumes that a student’s ability is constant
throughout a learning process [32].


2. RELATED WORK 


2.1 Item response theory 

There are many item response theory (IRT) models [3, 18,
34, 35, 41]. This subsection briefly introduces the two-parameter
logistic model (2PLM), an extremely popular IRT model. In
2PLM, the probability of a correct answer given to item $j$ by
student $i$ with ability parameter $\theta_i \in (-\infty, \infty)$ is assumed
as

$$P_j(\theta_i) = \frac{1}{1 + \exp(-1.7 a_j (\theta_i - b_j))}, \tag{1}$$

where $a_j \in (0, \infty)$ is the $j$-th item’s discrimination parameter
expressing the discriminatory power for students’ abilities,
and $b_j \in (-\infty, \infty)$ is the $j$-th item’s difficulty parameter
representing the degree of difficulty.
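
As a small illustration of equation (1), the following sketch evaluates the 2PLM response probability for a few parameter settings. The function name and the numeric values are hypothetical and chosen only to show how ability, discrimination, and difficulty interact.

```python
import math

def p_correct_2pl(theta, a, b):
    """Probability of a correct answer under the 2PLM, equation (1)."""
    return 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Hypothetical parameter values, chosen only for illustration.
p_equal = p_correct_2pl(theta=0.0, a=1.0, b=0.0)   # ability equals difficulty
p_able = p_correct_2pl(theta=1.0, a=1.0, b=0.0)    # more able student
p_sharp = p_correct_2pl(theta=1.0, a=2.0, b=0.0)   # more discriminating item
print(p_equal, p_able, p_sharp)
```

When ability equals difficulty the probability is exactly 0.5, and a larger discrimination parameter makes the curve steeper around that point.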


2.2 Dynamic key-value memory network 

The salient feature of DKVMN is that it assumes $N$ underlying
skills and relations between them and the input (items). Underlying
skills are stored in key memory $M^k \in \mathbb{R}^{N \times d_k}$, whereas
value memory $M_t^v \in \mathbb{R}^{N \times d_v}$ holds abilities of underlying
skills at time $t$. Here, $d_k$ and $d_v$ are tuning parameters. To
express the $j$-th item, the input of DKVMN is a one-hot vector
$q_j \in \{0, 1\}^J$, where $J$ represents the number of items, the
$j$-th element is 1, and the other elements
are zeros. DKVMN predicts the performance on item $j$ at
time $t$ as explained below.


First, DKVMN calculates the attention, which indicates how
strongly item $j$ is related to each skill, as

$$k_j^{(1)} = W^{(k)} q_j + \tau^{(k)}, \tag{2}$$
$$w_{tl} = \mathrm{Softmax}\left(M_l^k k_j^{(1)}\right), \tag{3}$$

where $M_l^k$ represents the $l$-th row vector of $M^k$ and $w_{tl}$ signifies the
degree of strength of the relation between skill $l$ and item
$j$ addressed by a student at time $t$. In addition, $W^{(\cdot)}$ denotes
a weight matrix or weight vector, and $\tau^{(\cdot)}$ denotes a bias vector
or scalar. Next, student vector $\theta_t^{(1)}$ is calculated as the
weighted sum of value memory:

$$\theta_t^{(1)} = \sum_{l=1}^{N} w_{tl} \left(M_{tl}^v\right)^{\top}. \tag{4}$$

Finally, it concatenates $\theta_t^{(1)}$ with $k_j^{(1)}$ and predicts the correct
answer probability $P_{jt}$ for item $j$ as

$$\theta_t^{(2)} = \tanh\left(W^{(\theta_2)} \left[\theta_t^{(1)}, k_j^{(1)}\right] + \tau^{(\theta_2)}\right), \tag{5}$$
$$P_{jt} = \sigma\left(W^{(y)} \theta_t^{(2)} + \tau^{(y)}\right), \tag{6}$$

where $M_{tl}^v$ represents the $l$-th row vector of $M_t^v$, $[\cdot]$ is a
concatenation of vectors, and $\sigma(\cdot)$ represents the sigmoid
function. Reportedly, DKVMN is capable of accurately
predicting performance. However, its parameters
lack interpretability.
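
The read-and-predict steps of equations (2) to (6) can be sketched as follows. This is a minimal illustration in plain Python, not the reference implementation: the toy sizes, the randomly initialized weights, and all variable names are assumptions, and the value-memory update is omitted.

```python
import math
import random

rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy sizes (assumed): J items, N latent skills, key and value dimensions.
J, N, d_k, d_v = 5, 3, 4, 4
W_k, tau_k = rand_matrix(d_k, J), [0.0] * d_k
M_k = rand_matrix(N, d_k)                 # key memory
M_v = rand_matrix(N, d_v)                 # value memory at time t
W_theta, tau_theta = rand_matrix(d_v, d_v + d_k), [0.0] * d_v
W_y, tau_y = rand_matrix(1, d_v), 0.0

def dkvmn_predict(j):
    q = [1.0 if m == j else 0.0 for m in range(J)]                 # one-hot item
    k = [a + b for a, b in zip(matvec(W_k, q), tau_k)]             # eq. (2)
    w = softmax(matvec(M_k, k))                                    # eq. (3)
    theta1 = [sum(w[l] * M_v[l][c] for l in range(N))
              for c in range(d_v)]                                 # eq. (4)
    hidden = matvec(W_theta, theta1 + k)                           # concatenation
    theta2 = [math.tanh(a + b) for a, b in zip(hidden, tau_theta)] # eq. (5)
    p = sigmoid(matvec(W_y, theta2)[0] + tau_y)                    # eq. (6)
    return w, theta1, p

attention, student_vector, prob = dkvmn_predict(2)
print(attention, prob)
```

Note that the attention weights, and therefore the student vector, change with the item index passed to `dkvmn_predict`, even though the value memory is the same.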


2.3 Deep-IRT

Deep-IRT is implemented by combining DKVMN with an
IRT module [43] to improve the DKVMN interpretability.
Deep-IRT exploits both the strong prediction ability of DKVMN
and the interpretable parameters of IRT. Deep-IRT adds a
hidden layer to DKVMN to obtain the ability and
item difficulty. Specifically, when a student attempts item $j$
at time $t$, an ability $\theta_{tj}^{(3)}$ and item difficulty $\beta_j$ are calculated
as shown below.

$$\theta_{tj}^{(3)} = \tanh\left(W^{(\theta_3)} \theta_t^{(2)} + \tau^{(\theta_3)}\right), \tag{7}$$
$$\beta_j = \tanh\left(W^{(\beta)} k_j^{(1)} + \tau^{(\beta)}\right), \tag{8}$$

where $\theta_t^{(2)}$ is the hidden student vector of equation (5) and $k_j^{(1)}$ is
the item vector of equation (2). The prediction is based on the difference
between $\theta_{tj}^{(3)}$ and $\beta_j$, as in IRT:

$$P_{jt} = \sigma\left(3.0 \times \theta_{tj}^{(3)} - \beta_j\right). \tag{9}$$

Here, ability $\theta_{tj}^{(3)}$ is calculated using the attention weight $w_{tl}$ of
equations (3) and (4), which depends on the item to solve because it implicitly
assumes that items with the same skills are equivalent. In
other words, the ability estimate for the same student and
time might differ if the student attempts a different item.
Furthermore, through equations (5) and (7), Deep-IRT uses the item vector
$k_j^{(1)}$ to calculate $\theta_{tj}^{(3)}$. An important difficulty is that a student’s
ability, which depends on each item, hinders the
interpretability of the parameters. Although Tsutsumi et al.
[32] also proposed a Deep-IRT for test theory, its purpose
differs from that of this study because it cannot be applied
to KT, as mentioned earlier.
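
The item dependence of the Deep-IRT ability can be made concrete with a toy computation following equations (4), (5), and (7) to (9). All numbers below (value memory, weights, attention weights, item vectors) are hypothetical, and the biases are omitted for brevity; the sketch only shows that the same memory yields different ability values for different items.

```python
import math

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy values: one student's value memory (N = 2 skills, d_v = 2).
M_v = [[0.8, -0.2],
       [-0.5, 0.4]]
W_theta2 = [[0.6, -0.3, 0.2, 0.1],
            [0.1, 0.5, -0.4, 0.3]]   # acts on [student vector, item vector]
W_theta3 = [[0.7, -0.6]]
W_beta = [[0.5, 0.9]]

def deep_irt(attention, item_vector):
    # eq. (4): attention-weighted read of the value memory
    theta1 = [sum(attention[l] * M_v[l][c] for l in range(2)) for c in range(2)]
    # eq. (5): hidden layer on the concatenation with the item vector
    theta2 = [math.tanh(v) for v in matvec(W_theta2, theta1 + item_vector)]
    # eq. (7): ability, and eq. (8): difficulty
    theta3 = math.tanh(matvec(W_theta3, theta2)[0])
    beta = math.tanh(matvec(W_beta, item_vector)[0])
    # eq. (9): IRT-like prediction
    return theta3, beta, sigmoid(3.0 * theta3 - beta)

# Same student, same time step, two different items.
theta_a, beta_a, p_a = deep_irt([0.9, 0.1], [1.0, 0.0])
theta_b, beta_b, p_b = deep_irt([0.2, 0.8], [0.0, 1.0])
print(theta_a, theta_b)   # the two ability values differ
```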

Figure 1: Network architecture for Deep-IRT with independent
student and item networks. The yellow components represent
the process of getting the attention weight. The
green components are associated with the student network
and the process of updating the value memory. The blue
components are associated with the item network.


3. DEEP-IRT WITH INDEPENDENT STUDENT AND ITEM NETWORKS


To resolve the difficulty described above, this study proposes
a novel Deep-IRT method comprising two independent neural
networks: the student network and the item network,
as shown in Figure 1. The student network employs a memory
network architecture, as DKVMN does, to capture changes
in student ability comprehensively. The item network takes
inputs of two kinds: the item attempted by a student
and the skills necessary to solve the item. Using the outputs
of both networks, the probability of a student answering an
item correctly can be calculated.


The proposed method can estimate student parameters and
item parameters independently without a decline in prediction
accuracy because the two independent networks
are designed to be more redundant than those of earlier methods,
based on state-of-the-art reports [9, 20, 21]. The proposed
method predicts $P_{jt}$, the probability of a correct answer
to item $j$ at time $t$, using the item difficulties
and the student abilities, as follows.


3.1 Item network 

In the item network, two difficulty parameters of item $j$
are estimated: the item characteristic difficulty parameter
$\beta_{\mathrm{item}}^{j}$ and the skill difficulty parameter $\beta_{\mathrm{skill}}^{j}$
of the skills required to solve item $j$. The
item characteristic difficulty parameter indicates the unique
difficulty of the item, excluding the required skill difficulty.
The proposed method expresses item difficulty as the sum
of the two difficulty parameters $\beta_{\mathrm{item}}^{j}$ and $\beta_{\mathrm{skill}}^{j}$.


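As with DKVMN, to express the $j$-th item, the input of the
item network is a one-hot vector $q_j \in \mathbb{R}^J$ whose $m$-th element
$q_{jm}$ is given as shown below.

$$q_{jm} = \begin{cases} 1 & (j = m) \\ 0 & (\text{otherwise}) \end{cases} \tag{10}$$

Here, $J$ stands for the number of items. The item network
comprises $n$ layers. The item characteristic difficulty parameter
of item $j$ is calculated using a feed-forward neural
network as

$$\beta_j^{(1)} = \tanh\left(W^{(\beta_1)} q_j + \tau^{(\beta_1)}\right), \tag{11}$$
$$\beta_j^{(k)} = \tanh\left(W^{(\beta_k)} \beta_j^{(k-1)} + \tau^{(\beta_k)}\right), \tag{12}$$
$$\beta_{\mathrm{item}}^{j} = W^{(\beta_{\mathrm{item}})} \beta_j^{(n)} + \tau^{(\beta_{\mathrm{item}})}, \tag{13}$$

where $k = \{2, \ldots, n\}$. The output of the last layer, $\beta_{\mathrm{item}}^{j}$,
represents the $j$-th item characteristic difficulty parameter.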


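Similarly, to compute the difficulty of skills, the proposed
method uses the input of necessary skills $s_j \in \mathbb{R}^S$, whose $m$-th
element $s_{jm}$ is given as presented below.

$$s_{jm} = \begin{cases} 1 & (\text{item } j \text{ requires skill } m) \\ 0 & (\text{otherwise}) \end{cases} \tag{14}$$

Here, $S$ represents the number of skills. The skill difficulty is calculated as

$$\gamma_j^{(1)} = \tanh\left(W^{(\gamma_1)} s_j + \tau^{(\gamma_1)}\right), \tag{15}$$
$$\gamma_j^{(k)} = \tanh\left(W^{(\gamma_k)} \gamma_j^{(k-1)} + \tau^{(\gamma_k)}\right), \tag{16}$$
$$\beta_{\mathrm{skill}}^{j} = W^{(\beta_{\mathrm{skill}})} \gamma_j^{(n)} + \tau^{(\beta_{\mathrm{skill}})}, \tag{17}$$

where $k = \{2, \ldots, n\}$. The output of the last layer, $\beta_{\mathrm{skill}}^{j}$,
denotes the difficulty parameter of the required skills to solve the $j$-th item.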
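
The two feed-forward paths of the item network, equations (10) to (17), can be sketched as below. This is an illustration under stated assumptions: the weights are random placeholders rather than trained values, the layer sizes are arbitrary, and the item-to-skill mapping `item_skills` is hypothetical.

```python
import math
import random

rng = random.Random(1)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def make_network(input_dim, hidden_dim, n_layers):
    """n tanh layers followed by one linear output unit (random placeholder weights)."""
    dims = [input_dim] + [hidden_dim] * n_layers
    hidden = [(rand_matrix(dims[i + 1], dims[i]), [0.0] * dims[i + 1])
              for i in range(n_layers)]
    output = (rand_matrix(1, hidden_dim), 0.0)
    return hidden, output

def forward(network, x):
    hidden, (W_out, tau_out) = network
    for W, tau in hidden:                       # eqs. (11)-(12) or (15)-(16)
        x = [math.tanh(v + t) for v, t in zip(matvec(W, x), tau)]
    return matvec(W_out, x)[0] + tau_out        # eq. (13) or (17): linear output

J, S, H, n = 4, 3, 5, 2                         # toy sizes (assumed)
item_net = make_network(J, H, n)
skill_net = make_network(S, H, n)
item_skills = {0: {0}, 1: {0}, 2: {1, 2}, 3: {2}}   # hypothetical skill tags

def item_difficulty(j):
    q = [1.0 if m == j else 0.0 for m in range(J)]               # eq. (10)
    s = [1.0 if m in item_skills[j] else 0.0 for m in range(S)]  # eq. (14)
    beta_item = forward(item_net, q)
    beta_skill = forward(skill_net, s)
    return beta_item, beta_skill, beta_item + beta_skill

print(item_difficulty(0))
print(item_difficulty(1))
```

Items 0 and 1 share the same skill tag in this toy mapping, so they receive the same skill difficulty but different item characteristic difficulties, which is the distinction the two parameters are meant to capture.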


3.2 Student network 


In the student network, the proposed method calculates $\theta_t^{(1)}$
based on the past response history as

$$\theta_t^{(1)} = \sum_{l=1}^{N} M_{tl}^v, \tag{18}$$

where $M_t^v$ is the value memory matrix holding the student’s latent
knowledge state, which is estimated in the same way as in DKVMN, and
$M_{tl}^v$ is its $l$-th row.
Next, an interpretable student ability vector $\theta_t^{(n)}$ is estimated
as follows. Therein, $n$ represents the number of hidden
layers, which is chosen according to the prediction accuracy on
actual data.

$$\theta_t^{(k)} = \tanh\left(W^{(\theta_k)} \theta_t^{(k-1)} + \tau^{(\theta_k)}\right), \tag{19}$$


$$\theta_{tj}^{(n+1)} = w_t^{\top} \theta_t^{(n)}, \tag{20}$$

where $k = \{2, \ldots, n\}$ and $w_t = (w_{t1}, \ldots, w_{tN})^{\top}$ collects the
attention weights of equation (3). Unlike Deep-IRT, the proposed
method does not multiply the attention in equation (18). In addition,
$\theta_t^{(n)}$ is not calculated using features of items, unlike equations (5)
and (7). Therefore, the ability parameter vector $\theta_t^{(n)}$ does
not depend on each item. Namely, it is independent of
the difficulty parameter. Each element of $\theta_t^{(n)}$ denotes the
ability for the corresponding latent skill because it is independent
of any item. Therefore, $\theta_t^{(n)}$ can be interpreted as
a measurement model such as a multidimensional IRT [25].
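
A minimal sketch of the student network follows. The value memory and the weights are hypothetical toy values. The function `ability_vector` takes no item argument, which is the point of equations (18) and (19); the last helper, which combines the ability vector with attention weights, reflects one possible reading of equation (20) and should be treated as an assumption.

```python
import math

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def ability_vector(M_v, hidden_layers):
    """Item-independent ability vector: eq. (18) followed by eq. (19)."""
    d = len(M_v[0])
    theta = [sum(row[c] for row in M_v) for c in range(d)]   # eq. (18): plain sum
    for W, tau in hidden_layers:                              # eq. (19), k = 2..n
        theta = [math.tanh(v + t) for v, t in zip(matvec(W, theta), tau)]
    return theta

# Hypothetical toy values: N = 2 latent skills, d_v = 2, n = 2 (one tanh layer).
M_v = [[0.8, -0.2],
       [-0.5, 0.4]]
layers = [([[0.9, 0.1],
            [-0.2, 0.7]], [0.0, 0.0])]

theta = ability_vector(M_v, layers)   # the same for every item

def ability_for_item(theta, attention):
    # Assumed reading of eq. (20): attention-weighted combination of abilities.
    return sum(w * t for w, t in zip(attention, theta))

print(theta)
print(ability_for_item(theta, [0.9, 0.1]), ability_for_item(theta, [0.2, 0.8]))
```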

3.3 Prediction of student response to an item

The proposed method predicts a student’s response probability
to an item using the difference between the student’s
ability $\theta_{tj}^{(n+1)}$ to solve item $j$ at time $t$ and the sum of the two
difficulty parameters $\beta_{\mathrm{item}}^{j}$ and $\beta_{\mathrm{skill}}^{j}$:

$$P_{jt} = \sigma\left(3.0 \times \theta_{tj}^{(n+1)} - \left(\beta_{\mathrm{item}}^{j} + \beta_{\mathrm{skill}}^{j}\right)\right). \tag{21}$$


After this procedure, the value memory is updated using
$c_j$, based on the input $q_j$ and the actual performance, as in
DKVMN [46].


The loss function of the proposed method is the cross-entropy,
which reflects classification errors. The cross-entropy
of the predicted responses $P_{jt}$ and the true responses $u_{jt}$ is
calculated as

$$\ell(u_{jt}, P_{jt}) = -\sum_{t} \left(u_{jt} \log P_{jt} + (1 - u_{jt}) \log(1 - P_{jt})\right), \tag{22}$$

where $u_{jt}$ is the true response to item $j$ at time $t$. The
student’s response $u_{jt}$ is recorded as 1 when the student
answers the item correctly and 0 otherwise. All parameters
are learned simultaneously using a well-known optimization
algorithm: adaptive moment estimation (Adam) [15].
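
The prediction of equation (21) and the loss of equation (22) can be written compactly as below. This is a sketch with hypothetical input values; the gradient-based optimization itself is not shown.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(theta, beta_item, beta_skill):
    """Equation (21): ability minus the sum of the two difficulties."""
    return sigmoid(3.0 * theta - (beta_item + beta_skill))

def cross_entropy(responses, probabilities):
    """Equation (22): summed over the time steps of one response sequence."""
    return -sum(u * math.log(p) + (1 - u) * math.log(1.0 - p)
                for u, p in zip(responses, probabilities))

# Hypothetical abilities and difficulties for three time steps.
probs = [predict(0.2, 0.1, 0.3), predict(0.5, -0.2, 0.1), predict(-0.4, 0.3, 0.2)]
loss = cross_entropy([1, 1, 0], probs)
print(probs, loss)
```

Raising either difficulty term lowers the predicted probability by the same amount, since the two enter the prediction only through their sum.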


4. PREDICTIVE ACCURACY 
4.1 Datasets 


This section compares the prediction accuracy for student
performance of the proposed method with that of earlier methods
(DKT, DKVMN, Deep-IRT, AKT) using four benchmark
datasets: ASSISTments2009¹, ASSISTments2015², Statics2011³,
and KDDcup⁴. ASSISTments2009 and KDDcup have
item and skill tags, although most methods explained in
the relevant literature adopt only the skill tag as an input.
However, methods with skill inputs rely on the assumption
that items with the same skill are equivalent [7]. That
assumption does not hold when the difficulties of items with the
same skill differ greatly. Therefore, as inputs to AKT and
the proposed method, we employ not only skills but also
items. ASSISTments2015 has only the skill tag. Therefore,
we employ only the skill tag as an input for that dataset.


Table 1 presents the number of students (No. Students),
the number of skills (No. Skills), the number of items (No.
Items), the rate of correct responses (Rate Correct), the
average number of items that a student attempted (Learning
Length), and the proportion of items attempted by fewer
than 10 students (Sparsity). For all the datasets,
we excluded students who attempted fewer than five items.
Additionally, we set 200 items as the upper limit of the input
length according to an earlier study [43]. When the input
length exceeds 200 items, we use the first
200 responses for all methods.
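
The filtering and truncation rules described above, together with the Sparsity statistic of Table 1, can be sketched as follows. The data layout (a dictionary from student id to a list of `(item_id, correct)` pairs) and the function names are assumptions made for illustration, and the source does not state whether Sparsity was computed before or after filtering.

```python
def preprocess(sequences, min_items=5, max_len=200):
    """Drop students with fewer than min_items responses; keep the first max_len."""
    return {student: seq[:max_len]
            for student, seq in sequences.items()
            if len(seq) >= min_items}

def sparsity(sequences, min_students=10):
    """Share of items attempted by fewer than min_students distinct students."""
    students_per_item = {}
    for seq in sequences.values():
        for item in {item for item, _ in seq}:
            students_per_item[item] = students_per_item.get(item, 0) + 1
    sparse = sum(1 for count in students_per_item.values() if count < min_students)
    return sparse / len(students_per_item)

# Hypothetical toy log: (item_id, correct) pairs per student.
raw = {
    "s1": [(i % 3, 1) for i in range(250)],   # longer than the 200-item limit
    "s2": [(0, 0)] * 4,                       # fewer than five responses
    "s3": [(1, 1)] * 5,
}
clean = preprocess(raw)
print(sorted(clean), len(clean["s1"]), sparsity(clean))
```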


¹ https://sites.google.com/site/assistmentsdata/home/assistment-2009-2010-data

² https://sites.google.com/site/assistmentsdata/home/2015-assistments-skill-builder-data

³ https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=507

⁴ https://pslcdatashop.web.cmu.edu/KDDCup/downloads.jsp

Figure 2: AUC and the Number of Layers


4.2 Hyperparameter selection and evaluation 
We used ten-fold cross-validation to evaluate the prediction
accuracies of the methods. The item parameters and the
hyperparameters are learned using 70% of each dataset. Given
these estimates, a student’s ability is
estimated at each time point using the remaining 30% of each
dataset. For all methods, the hidden layer size and memory
dimension are chosen from {10, 20, 50, 100, 200} using
cross-validation. In addition, for the earlier methods, we used the
hyperparameters reported in earlier studies [7, 43].


To determine the number of layers $n$ for the proposed method,
we conducted preliminary experiments on
ASSISTments2009 while varying the number of layers. The
results are presented in Figure 2. As shown in the figure, the
AUC score reaches its highest level when $n = 2$ and $n = 4$.
Based on this result, we employ $n = 2$ for the following
experiments because the computation time of the proposed method
increases exponentially as the number of layers increases.


If the predicted correct answer probability for the next item
is 0.5 or more, then the student’s response to the next item
is predicted as correct. Otherwise, the student’s response
is predicted as incorrect. For this study, we use three
metrics for prediction accuracy: Accuracy (Acc) score, AUC
score, and F1 score. The first, Acc, represents the
concordance rate between the predicted responses
and the true responses. The second, AUC, represents
the predictive accuracy of the correct answer probabilities.
The third, F1, is the average of the F1 score of incorrect answer
prediction and the F1 score of correct answer prediction.
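
The three metrics can be computed as in the following sketch, which uses only the definitions given above: a 0.5 threshold for Acc and F1, a pairwise-comparison form of AUC, and F1 averaged over the incorrect and correct classes. The function names and the toy data are hypothetical, and the quadratic-time AUC is chosen for clarity rather than speed.

```python
def threshold(probabilities):
    """Predict correct (1) when the probability is 0.5 or more."""
    return [1 if p >= 0.5 else 0 for p in probabilities]

def accuracy(y_true, probabilities):
    preds = threshold(probabilities)
    return sum(1 for y, p in zip(y_true, preds) if y == p) / len(y_true)

def auc(y_true, probabilities):
    """Probability that a correct response is ranked above an incorrect one."""
    pos = [p for y, p in zip(y_true, probabilities) if y == 1]
    neg = [p for y, p in zip(y_true, probabilities) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def f1_for_class(y_true, preds, cls):
    tp = sum(1 for y, p in zip(y_true, preds) if y == cls and p == cls)
    fp = sum(1 for y, p in zip(y_true, preds) if y != cls and p == cls)
    fn = sum(1 for y, p in zip(y_true, preds) if y == cls and p != cls)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def macro_f1(y_true, probabilities):
    preds = threshold(probabilities)
    return (f1_for_class(y_true, preds, 0) + f1_for_class(y_true, preds, 1)) / 2

# Hypothetical responses and predicted probabilities.
y = [1, 0, 1, 1, 0]
p = [0.9, 0.4, 0.6, 0.3, 0.5]
print(accuracy(y, p), auc(y, p), macro_f1(y, p))
```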


4.3 Results 

The respective values of Acc, AUC, and F1 for the benchmark
datasets are shown in Table 2. The results show that
the proposed method with item and skill inputs achieves
the best average Acc and
F1. Especially noteworthy is that the proposed method
outperforms AKT, which is the most advanced method, on these averages.
Furthermore, the proposed method with item and skill inputs
provides better average Acc, AUC, and F1 than the proposed method with only
skill or item inputs. These results indicate that parameter estimation, not
only with skill but also with item, improves the predictive
accuracy.

Table 1: Summary of Benchmark Datasets

Dataset     | No. Students | No. Skills | No. Items | Rate Correct | Learning Length | Sparsity
ASSIST2009  | 4,151        | 111        | 26,684    | 68.0%        | 70.8            | 55.2%
ASSIST2015  | 19,840       | 100        | N/A       | 73.2%        | 34.2            | 12.6%
Statics2011 | 333          | 156        | 1,223     | 77.7%        | 180.9           | 2.6%
KDDcup      | 820          | 43         | 476       | 78.3%        | 11.9            | 57.8%

Table 2: Predictive Accuracy of Student Performance with Benchmark Datasets

Dataset     | Metric | DKT   | DKVMN | Deep-IRT | AKT   | AKT (item&skill) | Proposed | Proposed (item&skill)
ASSIST2009  | Acc    | 0.759 | 0.763 | 0.768    | 0.692 | 0.755            | 0.768    | 0.765
ASSIST2009  | AUC    | 0.781 | 0.807 | 0.806    | 0.717 | 0.811            | 0.818    | 0.810
ASSIST2009  | F1     | 0.697 | 0.714 | 0.718    | 0.639 | 0.726            | 0.725    | 0.722
ASSIST2015  | Acc    | 0.754 | 0.749 | 0.747    | 0.757 | N/A              | 0.752    | N/A
ASSIST2015  | AUC    | 0.730 | 0.732 | 0.727    | 0.760 | N/A              | 0.751    | N/A
ASSIST2015  | F1     | 0.433 | 0.541 | 0.540    | 0.616 | N/A              | 0.543    | N/A
Statics2011 | Acc    | 0.769 | 0.805 | 0.817    | 0.809 | 0.818            | 0.819    | 0.822
Statics2011 | AUC    | 0.666 | 0.819 | 0.822    | 0.821 | 0.827            | 0.821    | 0.821
Statics2011 | F1     | 0.483 | 0.679 | 0.681    | 0.690 | 0.677            | 0.679    | 0.690
KDDcup      | Acc    | 0.784 | 0.773 | 0.792    | 0.774 | 0.780            | 0.786    | 0.802
KDDcup      | AUC    | 0.538 | 0.594 | 0.588    | 0.606 | 0.610            | 0.588    | 0.610
KDDcup      | F1     | 0.439 | 0.439 | 0.455    | 0.441 | 0.449            | 0.469    | 0.478
Average     | Acc    | 0.767 | 0.773 | 0.781    | 0.758 | 0.784            | 0.781    | 0.796
Average     | AUC    | 0.679 | 0.738 | 0.736    | 0.726 | 0.749            | 0.745    | 0.747
Average     | F1     | 0.513 | 0.593 | 0.599    | 0.597 | 0.617            | 0.604    | 0.630



However, AKT with item and skill inputs shows the best
average AUC. AKT with item and skill
inputs also provides higher performance than AKT
with only skill or item inputs, as shown in [7]. Ghosh et al. (2020)
reported that AKT is more effective for large datasets. Accordingly,
AKT provides the best performance for all the metrics
on ASSISTments2015, which has an extremely large number
of students.


Furthermore, surprisingly, the average AUC and
F1 obtained using the proposed method with skill input are
better than those of Deep-IRT, and the average Acc is equal (0.781),
although the proposed method separates
the student and item networks. This result implies that
redundant deep student and item networks function effectively
for performance prediction. These results are explainable
from reports of state-of-the-art methods [9, 20, 21].


The performance results obtained using DKVMN are almost 
identical to those obtained using Deep-IRT because they 
have almost identical network structures. Results show that 
DKT provides the worst performance among the methods 
studied here. 


5. PARAMETER INTERPRETABILITY 


5.1 Interpretability of difficulty parameters 

To evaluate the interpretability of the difficulty parameters
of the proposed method, we compare the parameters estimated by the
proposed method and by Deep-IRT with true IRT parameters using simulated
data. The dataset
includes 2000 students’ responses to 50 items and is generated
from the 2PLM shown in equation (1). The parameters are drawn from the priors
$\theta \sim N(0, 1)$, $a \sim LN(0, 1)$, and $b \sim N(0, 1)$.
We estimated the parameters of the proposed method and Deep-IRT
using the dataset. Table 3 shows the Pearson correlations
between the true difficulty parameters and the
estimated parameters of the proposed method
and Deep-IRT. Additionally, it shows the prediction accuracies
of the proposed method and Deep-IRT for the dataset.
The proposed method provides a higher correlation with the true parameters
than Deep-IRT does (0.886 vs. 0.611), and its
accuracy is marginally higher (0.695 vs. 0.694). The results demonstrate
that the two independent networks of the proposed
method function effectively for the interpretability of the
estimated parameters and for the prediction accuracy.

Table 3: Pearson correlation with the true difficulty parameters and prediction accuracy

Parameter  | Deep-IRT | Proposed
difficulty | 0.611    | 0.886
accuracy   | 0.694    | 0.695
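
A simulated dataset of the kind described above can be generated as in the following sketch, which draws the parameters from the stated priors and samples binary responses from equation (1). The seed, the function names, and the crude difficulty proxy (the share of incorrect answers per item) are assumptions; the proxy merely stands in for an estimated difficulty so that the Pearson correlation can be demonstrated, and it does not reproduce the values in Table 3.

```python
import math
import random

def p_correct(theta, a, b):
    """2PLM probability of equation (1), written in a numerically stable form."""
    x = 1.7 * a * (theta - b)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def simulate_2pl(n_students=2000, n_items=50, seed=0):
    rng = random.Random(seed)
    theta = [rng.gauss(0.0, 1.0) for _ in range(n_students)]     # theta ~ N(0, 1)
    a = [rng.lognormvariate(0.0, 1.0) for _ in range(n_items)]   # a ~ LN(0, 1)
    b = [rng.gauss(0.0, 1.0) for _ in range(n_items)]            # b ~ N(0, 1)
    responses = [[1 if rng.random() < p_correct(t, a[j], b[j]) else 0
                  for j in range(n_items)] for t in theta]
    return theta, a, b, responses

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)

theta, a, b, responses = simulate_2pl()
# Crude stand-in for an estimated difficulty: share of incorrect answers per item.
proxy = [1.0 - sum(row[j] for row in responses) / len(responses)
         for j in range(len(b))]
print(round(pearson(b, proxy), 3))
```

In an actual evaluation, `proxy` would be replaced by the difficulty estimates of the model under study.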


5.2 Student ability transitions 

This section shows student ability transitions estimated using the
proposed method. Visualizing the ability transition for each
skill is helpful for both students and teachers because they
can discover student strengths and weaknesses and can
improve the learning method to fill in the learning gaps. Yeung
[43] demonstrated a student ability transition for each
skill using Deep-IRT. However, the results included some
counter-intuitive ability estimates. For example, even when
the student answered incorrectly, the corresponding student
ability estimate increased. Moreover, Deep-IRT cannot identify
a relation among multidimensional skills. There are
cases in which a student’s ability for low-level skills decreases
even when the student responds correctly to items for high-level
skills. When Deep-IRT is used as a student model, these unstable
behaviors might cause serious difficulties and confuse
students and teachers.


Figure 3: An example of a student ability transition from the ASSIST2009 dataset. The skill tags are classified respectively as
equation solving two or fewer steps (blue), ordering fractions (orange), finding percents (green), and equation solving more than
two steps (red). The student responses to items are shown at the bottom of the graph.


Figure 3 depicts a student’s ability transitions estimated by the
proposed method for the ASSIST2009 dataset. The vertical axis shows
the student ability on the left side, with the student’s
response to an item on the right side. The horizontal axis
shows the item number. The student’s response is 1 when
the student answers the item correctly; it is 0 otherwise. The
student attempted the skills of “equation solving more than two
steps” (shown in red), “equation solving two or fewer steps”
(shown in blue), “ordering fractions” (shown in orange), and
“finding percents” (shown in green). Figure 3 can be
interpreted as explained below.


1. Theta 1 decreases when the student responds to item 2,
“ordering fractions” (orange), incorrectly and it increases
when the student responds to item 3 correctly. Therefore,
theta 1 indicates the ability of “ordering fractions”.


2. Items 6-17 correspond to the skill of “equation solving
two or fewer steps” (blue). Theta 2 indicates the ability
of “equation solving two or fewer steps” because theta 2
greatly increases while the student answers correctly.


3. For the skill of “finding percents” (green), the student
answers all items incorrectly. Theta 3 indicates the
ability of “finding percents” (green) because it greatly
decreases in items 18-24.


4. Items 4, 5, and 25-30 correspond to the skill of “equation
solving more than two steps” (red). Theta 4 decreases
when the student answers items 4 and 5 incorrectly,
and increases when the student answers
items 26-29 correctly. Therefore, theta 4 represents
the ability of “equation solving more than two steps”
(red).


Figure 3 shows that the proposed method estimates the ability
for each skill so that it reflects the student responses. Additionally,
it estimates relations among the skills. Therefore, when
a student responds to an item correctly/incorrectly, not only
does the ability for the corresponding skill increase/decrease; those
for other skills increase/decrease as well. Consequently, the
results demonstrate that the proposed method improves on both
the interpretability and the prediction accuracy of Deep-IRT.


6. CONCLUSIONS 


This study proposed a novel Deep-IRT that models a student’s
response to an item by two independent redundant
networks: a student network and an item network. Because
two independent redundant neural networks are used, the
parameters of the proposed method are highly interpretable
while high prediction accuracy is maintained. Moreover,
the proposed method employs both items and skills as inputs.
Experiments demonstrated that the proposed method
with item and skill inputs provided the best average Acc and F1
among the deep-learning-based
methods compared. The results also showed that AKT with item and skill
inputs provided the best average AUC. In particular,
AKT provided the best performance for large datasets, as
Ghosh et al. (2020) reported [7]. In addition, results of
experiments show that the parameters of the proposed method
are more interpretable than those of Deep-IRT. This study
employed slightly redundant deep networks compared to earlier
methods. As future work, we intend to use the proposed
method to investigate the performance of more redundant
and deeper networks. In addition, we will try to optimize a
forgetting function for past data to maximize the prediction
accuracy for large datasets.